Custom Content Identifier Rules
The Content matching rules page under Preferences provides a unified interface for you to define content inspection rules.
Creating Custom Rules
To create a custom rule:
- On the Rules page, click Custom Rule.
- In the Create New Rule pop-up window, enter a name and description.
- In the XML Rule editor, write the rule that defines the matching criteria. See Rule Editor for a description of the rule elements.
- Click Save.
Rule Editor
Using the XML-based rule editor, you can create custom rules that define pattern elements and specific criteria used to match content.
The rule consists of the following two sections.
- Entities
- Keywords and Regex
Here is a breakdown of the rule.
Entity
The Entity object contains the attributes and pattern definitions that specify the content-matching criteria. An entity must contain at least one pattern definition.
<Rules>
  <Entity patternsProximity="100" recommendedConfidence="85">
    <Pattern confidenceLevel="85">
      <IdMatch idRef="num_string" />
      <Match idRef="words" minOccurs="1" />
    </Pattern>
  </Entity>
<!-- Entity is required and must contain at least one Pattern child -->
<!-- patternsProximity: The number of characters that the Entity's child/inner Pattern elements will consider surrounding content as corroborative evidence. Must be 1 or more. -->
<!-- recommendedConfidence: The confidence threshold (1 - 100) to consider the Entity as matching -->
<!-- confidenceLevel: The confidence (1 - 100) that this Pattern contributes to the parent Entity's overall confidence level if it matches. -->
<!-- idRef is required and should reference Keyword id or Regex id -->
<!-- The match won't be included in the match results; instead, it serves as supplementary evidence -->
Entity Elements
| Element | Sub Element | Description |
|---|---|---|
| patternsProximity | | The number of characters around the matched content within which supporting evidence is considered. For example, identify patterns where a company name is found within 100 characters of a numerical pattern such as 9999.99. |
| recommendedConfidence | | The minimum threshold percentage that the combined "confidenceLevel" of all the patterns must meet or exceed to be considered a match. For example, if the recommendedConfidence is 85%, then the combined confidenceLevel of all the patterns must be equal to or greater than 85% for the pattern found in the content to count as a match. The more critical it is to identify the pattern correctly, the higher the recommendedConfidence value you can assign. |
| Pattern | confidenceLevel | An estimated percentage value assigned based on the reliability of the pattern. For example, in the right context, matching a numerical value such as "9999.9999" merits a higher confidence level than matching a word such as "transaction". |
| | IdMatch | References the primary keyword or regular expression that should be matched. |
| | Match | References a supplementary keyword or regular expression that provides supporting evidence and increases confidence in pattern identification. |
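To illustrate how these elements fit together, here is a minimal sketch that uses a regular expression as the primary match and a keyword list as supporting evidence. The ids `account_number` and `finance_terms`, the terms, and the attribute values are hypothetical and chosen only for illustration. The element and attribute names are the ones described above, and the comment on `Match` is an assumed reading of how `patternsProximity` and `minOccurs` interact.

```xml
<Rules>
  <!-- Hypothetical entity: a 10-digit number corroborated by a nearby finance term -->
  <Entity patternsProximity="50" recommendedConfidence="75">
    <Pattern confidenceLevel="75">
      <!-- Primary match: references the Regex id defined below -->
      <IdMatch idRef="account_number" />
      <!-- Supporting evidence (assumption: at least one term must appear
           within the 50-character proximity window) -->
      <Match idRef="finance_terms" minOccurs="1" />
    </Pattern>
  </Entity>
  <Regex id="account_number">\d{10}</Regex>
  <Keyword id="finance_terms">
    <Group matchStyle="word">
      <Term>invoice</Term>
      <Term>transaction</Term>
    </Group>
  </Keyword>
</Rules>
```

Because the single pattern's confidenceLevel equals the entity's recommendedConfidence, a successful match of this pattern would be expected to meet the threshold.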
Keywords and Regex
In the second section of the rule, you define the primary and supplementary keywords and regular expressions to be matched.
  <Keyword id="words">
    <Group matchStyle="word">
      <Term>google</Term>
      <Term>facebook</Term>
    </Group>
  </Keyword>
  <Regex id="phone_number">\d{10}</Regex>
  <Keyword id="num_string">
    <Group matchStyle="numerical_string">
      <Term>9999.9999</Term>
    </Group>
  </Keyword>
</Rules>
Keyword & Regex Elements
| Element | Description |
|---|---|
| Keyword | |
| Keyword id | A unique ID for this keyword group. For example, "words", "num_string", etc. IDs must not repeat within a rule. |
| Group | |
| Group matchStyle | Specifies the type of string to be matched. For example, "letter", "word", "phrase", "symbol", "numerical_string", etc. |
| Group Term | One or more terms to be matched within the keyword group. |
| Regex id | A unique ID for this regular expression. |
| Regex | The regular expression that defines a content pattern. For example, "\d{10}" is a regular expression that matches a sequence of 10 digits, such as a phone number. |
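As a small, hedged illustration of the two definition types, the fragment below pairs a phrase-style keyword group with a digit-matching regular expression. The ids `payment_phrases` and `ten_digits` and the terms are hypothetical. It is also an assumption that the "phrase" matchStyle accepts multi-word terms, so verify this in the rule editor before relying on it. The fragment is meant to sit inside a `<Rules>` element alongside an Entity that references these ids.

```xml
<!-- Hypothetical keyword group; assumes matchStyle="phrase" allows multi-word terms -->
<Keyword id="payment_phrases">
  <Group matchStyle="phrase">
    <Term>wire transfer</Term>
    <Term>account number</Term>
  </Group>
</Keyword>

<!-- \d{10} matches ten consecutive digits, e.g. 4155550123.
     Without the backslash, d{10} would match ten literal "d" characters instead. -->
<Regex id="ten_digits">\d{10}</Regex>
```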
Example 1
In this example, we created a rule to identify a specific transaction involving a company name such as "Google" or "Facebook" and a transaction value of "9999.9999". In this rule, the numerical value is the primary keyword, and the company name is a supplementary keyword.
<Rules>
  <!-- Entity is required and must contain at least one Pattern child -->
  <!-- patternsProximity: The number of characters that the Entity's child/inner Pattern elements will consider surrounding content as corroborative evidence. Must be 1 or more. -->
  <!-- recommendedConfidence: The confidence threshold (1 - 100) to consider the Entity as matching -->
  <Entity patternsProximity="100" recommendedConfidence="85">
    <!-- confidenceLevel: The confidence (1 - 100) that this Pattern contributes to the parent Entity's overall confidence level if it matches. -->
    <Pattern confidenceLevel="85">
      <!-- idRef is required and should reference Keyword id or Regex id -->
      <IdMatch idRef="num_string" />
      <!-- The match won't be included in the match results; instead, it serves as supplementary evidence -->
      <Match idRef="words" minOccurs="1" />
    </Pattern>
  </Entity>
  <!-- The 'id' should be unique per rule; the IDs of both Keywords and Regex should not repeat. -->
  <Keyword id="words">
    <Group matchStyle="word">
      <!-- Match either of these strings -->
      <Term>google</Term>
      <Term>facebook</Term>
    </Group>
  </Keyword>
  <!-- Perl regular expression syntax -->
  <Regex id="phone_number">\d{10}</Regex>
  <Keyword id="num_string">
    <Group matchStyle="numerical_string">
      <Term>9999.9999</Term>
    </Group>
  </Keyword>
</Rules>
Example 2
In this example, we created a rule that uses a regular expression to match specific user names and email addresses.
<Rules>
  <!-- Entity for detecting specific keywords or email addresses -->
  <Entity id="Keyword_Detection" patternsProximity="100" recommendedConfidence="85">
    <!-- Pattern to detect specific keywords or email addresses -->
    <Pattern confidenceLevel="90">
      <IdMatch idRef="regex_keywords" />
    </Pattern>
  </Entity>
  <!-- Regex for matching specific keywords or email addresses -->
  <Regex id="regex_keywords">\b(?:JDoe|john\.doe@acme\.com|JSmith|jane@acme\.com|MBrown|michael\.brown@acme\.com)\b</Regex>
</Rules>
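When the set of addresses is open-ended, listing each one may not scale. As a hedged variation on this example, the sketch below matches any address at a single domain instead of a fixed list. The ids `Domain_Email_Detection` and `acme_email` and the character class for the local part are assumptions made for illustration, not part of the original rule. The pattern will also match addresses you may not intend to flag, so test it against representative content first.

```xml
<Rules>
  <!-- Hypothetical entity: flags any email address at acme.com -->
  <Entity id="Domain_Email_Detection" patternsProximity="100" recommendedConfidence="85">
    <Pattern confidenceLevel="90">
      <IdMatch idRef="acme_email" />
    </Pattern>
  </Entity>
  <!-- Local part: one or more letters, digits, or . _ % + - characters,
       followed by the literal domain "@acme.com" -->
  <Regex id="acme_email">\b[A-Za-z0-9._%+-]+@acme\.com\b</Regex>
</Rules>
```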